Classification cost: An empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost

Authors

  • Jungeun Kim
  • Keunho Choi
  • Gunwoo Kim
  • Yongmoo Suh
Abstract

Loan fraud is a critical factor in the insolvency of financial institutions, so companies try to reduce fraud losses by building models for proactive fraud prediction. However, two critical problems in fraud detection remain unresolved: (1) most prediction models do not distinguish the cost of a type I error from the cost of a type II error, and (2) the class distribution in fraud detection datasets is highly skewed because fraud-related data are sparse. The objective of this paper is to examine whether classification cost is affected both by the cost-sensitive approach and by the skewed class distribution. To that end, we compare the classification cost incurred by a traditional cost-insensitive classification approach and by two cost-sensitive classification approaches, Cost-Sensitive Classifier (CSC) and MetaCost. Experiments were conducted with a credit loan dataset from a major financial institution in Korea, while varying the class distribution in the dataset and the number of input variables. The experiments showed that the lowest classification cost was incurred when the MetaCost approach was used and when non-fraud data and fraud data were balanced. In addition, the dataset that includes all delinquency variables was shown to be the most effective in reducing the classification cost. © 2011 Elsevier Ltd. All rights reserved.
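The notion of classification cost in the abstract can be illustrated with a small sketch. This is not the authors' experimental setup: the labels, the predicted fraud probabilities, and the cost values (1 for a false alarm, 5 for a missed fraud) are hypothetical, and the function names are invented for illustration. The sketch compares a cost-insensitive 0.5 threshold with a minimum-expected-cost decision rule, which is the general idea behind cost-sensitive approaches.

```python
def classification_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total misclassification cost; correct predictions are assumed to cost 0."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * cost_fp + fn * cost_fn


def threshold_predict(prob_fraud, threshold=0.5):
    """Cost-insensitive rule: predict fraud (1) when P(fraud) exceeds the threshold."""
    return [1 if p > threshold else 0 for p in prob_fraud]


def min_cost_predict(prob_fraud, cost_fp, cost_fn):
    """Cost-sensitive rule: predict the class with the lower expected cost.

    Predicting non-fraud risks p * cost_fn; predicting fraud risks (1 - p) * cost_fp.
    """
    return [1 if p * cost_fn > (1 - p) * cost_fp else 0 for p in prob_fraud]


# Hypothetical data: 1 = fraud, 0 = non-fraud.
y_true = [0, 0, 0, 1, 1]
prob_fraud = [0.1, 0.3, 0.6, 0.2, 0.8]
COST_FP, COST_FN = 1, 5  # assumed costs, not taken from the paper

plain_cost = classification_cost(
    y_true, threshold_predict(prob_fraud), COST_FP, COST_FN
)
sensitive_cost = classification_cost(
    y_true, min_cost_predict(prob_fraud, COST_FP, COST_FN), COST_FP, COST_FN
)
print(plain_cost, sensitive_cost)
```

With these assumed numbers the cost-sensitive rule accepts more false alarms in exchange for missing fewer frauds, which lowers the total cost whenever a missed fraud is the more expensive error.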


Similar resources

An Efficient Predictive Model for Myocardial Infarction Using Cost-sensitive J48 Model

BACKGROUND Myocardial infarction (MI) results from heart muscle death, and its costs, such as the loss of human life, exceed the treatment costs. This study aimed to present an MI prediction model using classification data mining methods that consider the imbalanced nature of the problem. METHODS We enrolled 455 healthy individuals and 295 myocardial infarction cases among visitors to Shahid Madani Specialized ...


Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...


Optimizing a Cost Matrix to Solve Rare-Class Biological Problems

In a binary dataset, a rare-class problem occurs when one class of data (typically the class of interest) is far outweighed by the other. Such a problem is typically difficult to learn and classify and is quite common, especially among biological problems such as the identification of gene conversions. A multitude of solutions for this problem exist with varying levels of success. In this paper...


A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate

Support vector machine (SVM) is a popular classification technique that classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which means that SVM training involves solving an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...


MetaCost: A General Method for Making Classifiers Cost-Sensitive

Research in machine learning, statistics and related fields has produced a wide variety of algorithms for classification. However, most of these algorithms assume that all errors have the same cost, which is seldom the case in KDD problems. Individually making each classification learner cost-sensitive is laborious, and often non-trivial. In this paper we propose a principled method for making an a...



Journal:
  • Expert Syst. Appl.

Volume 39  Issue 

Pages  -

Publication date 2012